Update src/llamafactory/train/sft/metric.py #4877
What does this PR do?
This PR optimizes the inputs passed to the ROUGE/BLEU metrics and adds support for evaluating English data. ROUGE and BLEU scores can now be computed more accurately, and word segmentation is selected automatically depending on whether the dataset is Chinese or English.
For Chinese data
The `ComputeSimilarity` class in `metric.py` seems to be designed specifically for Chinese datasets. In my opinion, there are two problems with the current evaluation code for Chinese data. The argument currently passed to `sentence_bleu` is a list of individual Chinese characters, whereas in practice it is better to use Chinese words. For example, the hypothesis passed to `sentence_bleu` should preferably be `['你好', '世界']` rather than `['你', '好', '世', '界']`; a small sketch of the difference follows.
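The snippet below is a minimal sketch (not the exact code in this PR) contrasting character-level and word-level BLEU for Chinese text. `jieba.lcut` and NLTK's `sentence_bleu` are real APIs; the example strings are made up.

```python
import jieba
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

pred = "你好世界"
label = "你好中国"

# Current behaviour: every Chinese character becomes a token,
# e.g. ['你', '好', '世', '界'] vs. ['你', '好', '中', '国'].
char_hyp, char_ref = list(pred), list(label)

# Proposed behaviour: segment into words with jieba,
# which typically yields ['你好', '世界'] vs. ['你好', '中国'].
word_hyp, word_ref = jieba.lcut(pred), jieba.lcut(label)

smooth = SmoothingFunction().method3
print(sentence_bleu([char_ref], char_hyp, smoothing_function=smooth))
print(sentence_bleu([word_ref], word_hyp, smoothing_function=smooth))
```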
For English data
In addition, the current code has problems when evaluating English data, partly because of jieba word segmentation. For example, the argument currently passed to `sentence_bleu` is a list of individual English letters instead of words, which does not conform to the standard usage documented in nltk/translate/bleu_score. So I added code to support English data evaluation; a rough sketch of the approach follows.
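As an illustration of the language-aware selection (the helper names and the Chinese-detection regex below are my own assumptions, not necessarily what this PR implements), the idea is to pick jieba segmentation for Chinese text and plain word splitting for English text before calling `sentence_bleu`:

```python
import re

import jieba
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Hypothetical helper: any CJK character in the text means we treat it as Chinese.
_CHINESE_CHAR = re.compile(r"[\u4e00-\u9fff]")

def tokenize(text: str) -> list:
    """Segment Chinese text with jieba; split English text on whitespace."""
    if _CHINESE_CHAR.search(text):
        return jieba.lcut(text)
    return text.split()

def compute_bleu(pred: str, label: str) -> float:
    hypothesis, reference = tokenize(pred), tokenize(label)
    return sentence_bleu(
        [reference], hypothesis, smoothing_function=SmoothingFunction().method3
    )

# English example: tokens are words, not single letters.
print(compute_bleu("hello world", "hello there world"))
# Chinese example: tokens are jieba words, not single characters.
print(compute_bleu("你好世界", "你好中国"))
```

The same tokenization could presumably be reused for ROUGE by joining the tokens with spaces before scoring, since ROUGE implementations generally expect whitespace-separated tokens.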
Fixes # (issue)
Before submitting